84 research outputs found

    Is it ethical to avoid error analysis?

    Full text link
    Machine learning algorithms tend to create more accurate models when large datasets are available. In some cases, highly accurate models can hide the presence of bias in the data. Several published studies tackle the development of discrimination-aware machine learning algorithms. We center on the further evaluation of machine learning models through error analysis, to understand under what conditions a model does not work as expected. We focus on the ethical implications of avoiding error analysis, from the perspectives of falsification of results and discrimination. Finally, we show different ways to approach error analysis in non-interpretable machine learning algorithms such as deep learning. Comment: Presented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017).
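As a minimal illustration of the kind of error analysis the abstract argues for, the sketch below computes per-slice error rates so that failures hidden by an aggregate accuracy figure become visible. The data and field names are hypothetical, not from the paper:

```python
from collections import defaultdict

def error_rates_by_slice(records, slice_key):
    """Group prediction records by a feature and compute the error
    rate within each slice, exposing where a model underperforms."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for rec in records:
        key = rec[slice_key]
        totals[key] += 1
        if rec["pred"] != rec["label"]:
            errors[key] += 1
    return {k: errors[k] / totals[k] for k in totals}

# Hypothetical predictions: overall accuracy looks fine (75%),
# but slicing by group reveals the model fails entirely on group "b".
records = [
    {"group": "a", "label": 1, "pred": 1},
    {"group": "a", "label": 0, "pred": 0},
    {"group": "a", "label": 1, "pred": 1},
    {"group": "b", "label": 1, "pred": 0},
]
print(error_rates_by_slice(records, "group"))  # {'a': 0.0, 'b': 1.0}
```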

    Learning Machine Learning: A Case Study

    Full text link

    Social shaping of digital publishing: exploring the interplay between culture and technology

    Get PDF
    The processes and forms of electronic publishing have been changing since the advent of the Web. In recent years, the open access movement has been a major driver of scholarly communication, and change is also evident in other fields such as e-government and e-learning. Whilst many changes are driven by technological advances, an altered social reality is also pushing the boundaries of digital publishing. With 23 articles and 10 posters, Elpub 2012 focuses on the social shaping of digital publishing and explores the interplay between culture and technology. This book contains the proceedings of the conference, consisting of 11 accepted full articles and 12 articles accepted as extended abstracts. The articles are presented in groups and cover the topics: digital scholarship and publishing; special archives; libraries and repositories; digital texts and readings; and future solutions and innovations. Offering an overview of the current situation and exploring the trends of the future, this book will be of interest to all those whose work involves digital publishing.

    Detecting ditches using supervised learning on high-resolution digital elevation models

    Get PDF
    Drained wetlands can constitute a large source of greenhouse gas emissions, but the drainage networks in these wetlands are largely unmapped, and better maps are needed both to aid forest production and to better understand the climate consequences. We develop a method for detecting ditches in high-resolution digital elevation models derived from LiDAR scans. Thresholding methods using digital terrain indices can be used to detect ditches; however, a single threshold generally does not capture the variability in the landscape and generates many false positives and negatives. We hypothesise that, by combining the digital terrain indices using supervised learning, we can improve ditch detection at a landscape scale. In addition to the digital terrain indices, additional features are generated by transforming the data to include neighbouring cells for better ditch predictions. A Random Forests classifier is used to locate the ditches, and its probability output is processed to remove noise and binarised to produce the final ditch prediction. The 95% confidence interval for Cohen's Kappa across the evaluation plots is [0.655, 0.781]. The study demonstrates that combining information from a suite of digital terrain indices using machine learning provides an effective technique for automatic ditch detection at a landscape scale, aiding both practical forest management and efforts to combat climate change.
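The neighbouring-cell feature transformation described above can be sketched as follows. This is an illustrative reconstruction in NumPy, not the authors' code; the 3×3 window and edge padding are assumptions:

```python
import numpy as np

def neighbourhood_features(index_grid):
    """Stack each cell's value with its 8 neighbours (edge-padded),
    turning a single terrain-index raster into a 9-feature array
    suitable for a per-cell classifier such as Random Forests."""
    rows, cols = index_grid.shape
    padded = np.pad(index_grid, 1, mode="edge")
    # One shifted view per position in the 3x3 window around each cell.
    shifts = [padded[r:r + rows, c:c + cols]
              for r in range(3) for c in range(3)]
    return np.stack(shifts, axis=-1)  # shape (rows, cols, 9)

grid = np.arange(16, dtype=float).reshape(4, 4)  # toy terrain index
feats = neighbourhood_features(grid)
print(feats.shape)  # (4, 4, 9); feats[..., 4] is the centre cell itself
```

Each raster cell then carries its own index value plus its neighbourhood context, which is what lets a per-cell classifier pick up linear ditch structures that a single-cell threshold misses.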

    Status Quo and Problems of Requirements Engineering for Machine Learning: Results from an International Survey

    Full text link
    Systems that use Machine Learning (ML) have become commonplace for companies that want to improve their products and processes. The literature suggests that Requirements Engineering (RE) can help address many problems when engineering ML-enabled systems. However, the empirical evidence on how RE is applied in practice in the context of ML-enabled systems is dominated by isolated case studies with limited generalizability. We conducted an international survey to gather practitioner insights into the status quo and problems of RE in ML-enabled systems. We gathered 188 complete responses from 25 countries. We conducted quantitative statistical analyses of contemporary practices using bootstrapping with confidence intervals, and qualitative analyses of the reported problems involving open and axial coding procedures. We found significant differences in RE practices within ML projects. For instance, (i) RE-related activities are mostly conducted by project leaders and data scientists, (ii) the prevalent requirements documentation format is interactive Notebooks, (iii) the main focus of non-functional requirements includes data quality, model reliability, and model explainability, and (iv) the main challenges include managing customer expectations and aligning requirements with data. The qualitative analyses revealed that practitioners face problems related to a lack of business domain understanding, unclear goals and requirements, low customer engagement, and communication issues. These results provide a better understanding of the adopted practices and of the problems that exist in practical environments. We put forward the need to further adapt and disseminate RE-related practices for engineering ML-enabled systems. Comment: Accepted for publication at PROFES 202
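The "bootstrapping with confidence intervals" mentioned above can be sketched as a percentile bootstrap. The answer data below is invented for illustration; only the respondent count of 188 comes from the abstract:

```python
import random

def bootstrap_ci(sample, stat, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for any statistic:
    resample with replacement, recompute the statistic, and take
    the empirical alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    stats = sorted(
        stat([rng.choice(sample) for _ in range(len(sample))])
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical survey answers: 1 = "uses interactive Notebooks for
# requirements documentation" (the split 120/68 is made up).
answers = [1] * 120 + [0] * 68  # 188 respondents, as in the survey
mean = lambda xs: sum(xs) / len(xs)
lo, hi = bootstrap_ci(answers, mean)
print(round(lo, 2), round(hi, 2))  # a 95% CI around the sample proportion
```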

    Evaluation of classifier performance and the impact of learning algorithm parameters

    No full text
    Much research has been done in the fields of classifier performance evaluation and optimization. This work summarizes that research and tries to answer the question of whether algorithm parameter tuning has more impact on performance than the choice of algorithm. An alternative way of evaluating classifiers, a measure function, is also demonstrated. This type of evaluation is compared with one of the most widely accepted methods, the cross-validation test. Experiments described in this work show that parameter tuning often has more impact on performance than the actual choice of algorithm, and that the measure function could be a complement or an alternative to standard cross-validation tests.
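The cross-validation test referred to above rests on splitting the data into k folds and holding each fold out in turn. A minimal, library-free sketch of the index splitting (an illustration, not the authors' implementation):

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation,
    distributing any remainder over the first folds so every
    example appears in exactly one test fold."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
print([len(test) for _, test in folds])  # [4, 3, 3]
```

A parameter-tuning comparison of the kind the abstract describes would then train each algorithm configuration on every `train` split and average its score over the corresponding `test` splits.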

    On the Metric-Based Approach to Supervised Concept Learning

    No full text
    A classifier is a piece of software that is able to categorize objects for which the class is unknown. The task of automatically generating classifiers by generalizing from examples is an important problem in many practical applications. This problem is often referred to as supervised concept learning, and has been shown to be relevant in e.g. medical diagnosis, speech and handwriting recognition, stock market analysis, and other data mining applications. The main purpose of this thesis is to analyze current approaches to evaluating classifiers and supervised concept learners, and to explore possible improvements in terms of alternative or complementary approaches. In particular, we investigate the metric-based approach to evaluation as well as how it can be used when learning. Any supervised concept learning algorithm can be viewed as trying to generate a classifier that optimizes a specific, often implicit, metric (this is sometimes also referred to as the inductive bias of the algorithm). In addition, different metrics are suitable for different learning tasks, i.e., the requirements vary between application domains. The idea of metric-based learning is to both make the metric explicit and let it be defined by the user based on the learning task at hand. The thesis contains seven studies, each with its own focus and scope. First, we present an analysis of current evaluation methods and contribute with a formalization of the problems of learning, classification and evaluation. We then present two quality attributes, sensitivity and classification performance, that can be used to evaluate learning algorithms. To demonstrate their usefulness, two metrics for these attributes are defined and used to quantify the impact of parameter tuning and the overall performance. Next, we refine an approach to multi-criteria classifier evaluation, based on the combination of three metrics, and present algorithms for calculating these metrics.
In the fourth study, we present a new method for multi-criteria evaluation, which is generic in the sense that it only dictates how to combine metrics; the actual choice of metrics is application-specific. The fifth study investigates whether the performance according to an arbitrary application-specific metric can be boosted by using that metric as the one that the learning algorithm aims to optimize. The subsequent study presents a novel data mining application for preventing spyware by classifying End User License Agreements. A number of state-of-the-art learning algorithms are compared using the generic multi-criteria method. Finally, in the last study we describe how methods from the area of software engineering can be used to solve the problem of selecting relevant evaluation metrics for the application at hand.
Technological development has changed our way of life and shifted the focus of the global economy from the production of goods to the collection and refinement of information. One consequence of this change is that we have become more dependent on databases for storage and data processing. The number, and especially the size, of these databases is growing rapidly, which makes it increasingly difficult to extract useful information. Techniques and methods from data mining have proven well suited to this task, in industry as well as in several scientific and technical fields. Data mining, or knowledge discovery, is an interdisciplinary field related to artificial intelligence, statistics, database technology, and computer systems engineering. Its aim is to develop knowledge and methods for extracting useful information from large amounts of data. A common task is to extract information that can be used to describe different types of objects or events; this information can then be used to categorize those objects or events.
If the extracted information can be used to sort data into a limited number of categories, this suggests that it contains a general description of each category. Within machine learning, a computer science field closely related to artificial intelligence, methods have been developed that are particularly useful for automatically generating category descriptions by generalizing from already categorized data. Such methods are known as supervised concept learning. The methods are generally called learning algorithms, and the generated category descriptions are called classifiers, since they are used to classify, or categorize, data. Evaluating learning algorithms and classifiers is necessary both to ensure that a given method solves the studied problem sufficiently well and to be able to choose a suitable learning algorithm from the many available. The thesis addresses critical questions concerning the evaluation of learning systems and presents new approaches and metrics specifically intended for evaluating supervised concept learning algorithms and classifiers. Learning algorithms are typically evaluated by how accurately the learned classifiers categorize new data (data not used for learning, whose categories were not previously known to the learning algorithm). Accuracy is measured by letting a classifier categorize a set of data and dividing the number of correct categorizations by the total number of categorizations. Theoretical as well as empirical studies have demonstrated several shortcomings of this metric. First and foremost, the evaluation is limited because only a single quality aspect is examined.
Supervised concept learning is used across a broad spectrum of application areas (for example diagnosis, image and sound recognition, and prediction), and each specific application has its own set of goals and requirements that must be met. Moreover, accuracy is known to be an unreliable metric, since its assumptions, such as the data being evenly distributed across all categories, are seldom satisfied in real-world situations. Several alternative metrics have been proposed, but few methods exist for choosing suitable metrics for a given problem or application. The central theme of the thesis is the metric-based approach, which tailors evaluation and learning to specific applications by systematically choosing appropriate metrics based on an application's goals and requirements. Among other contributions, the thesis presents a general method for weighted evaluation over multiple metrics and proposes a method for systematically selecting metrics based on application goals. Furthermore, the thesis presents a method for metric-based learning, in which a learning algorithm takes the relevant metrics into account already during the learning phase. The results show that this method improves the possibilities of creating classifiers tailored to a particular application. The potential of metric-based evaluation and learning is judged to be great, as data mining is used to an increasing degree in both research and industry and the goals of its applications vary widely.
Today, data mining techniques and supervised concept learning are used in, among other areas, medicine (patient diagnosis and prediction of care needs), IT security (intrusion detection, spam filtering, detection of privacy-invasive software), video and image analysis (recognition via fingerprints and facial features), and automated language understanding (categorization of text documents and language identification).
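The generic multi-criteria method in this thesis dictates only how to combine metrics, leaving the choice of metrics application-specific. A minimal sketch of such a weighted combination; the metric names, values, and weights below are invented for illustration:

```python
def weighted_score(metrics, weights):
    """Generic multi-criteria score: combine several evaluation
    metrics (each assumed already scaled to [0, 1]) using
    user-chosen, application-specific weights that sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[name] * metrics[name] for name in weights)

# Hypothetical metric values for one classifier; the weights encode
# that this particular application cares most about sensitivity.
metrics = {"accuracy": 0.90, "sensitivity": 0.70, "auc": 0.85}
weights = {"accuracy": 0.2, "sensitivity": 0.5, "auc": 0.3}
print(weighted_score(metrics, weights))  # 0.2*0.90 + 0.5*0.70 + 0.3*0.85
```

Making the weights explicit is the point of the metric-based approach: two applications can rank the same set of classifiers differently simply by encoding different goals in `weights`.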
